Introduction

The EU Horizon SURIMI project aims to deliver a suite of ready-to-use socio-economic and ecological simulation models that are integrated into the EU Digital Twin Ocean. These models leverage a wide range of EU fishery activity data, including datasets from the Fisheries Data Collection Framework (DCF), to support evidence-based policy and sustainable management of marine resources. However, most data are collected in an aggregated form, which limits their spatial resolution and applicability to the fine-scale assessment and simulation of vessel behaviour. For these reasons, one of the objectives of the SURIMI project is to create a set of algorithms to disaggregate the socio-economic dataset at a finer spatial and technical level or even at the level of individual fishing vessels, using public data and online resources (DCF and Global Fishing Watch). 

Summary of the procedure

This procedure (Fig XX) outlines a method to estimate fisheries economic indicators at the level of individual vessels and ICES cells by integrating multiple data sources through a disaggregation methodology. Aggregated data from the EU's FDI (Fisheries Dependent Information) database—including effort and landing data—are provided at the ICES cell level and by metier, while the AER database, including economic indicators, is grouped by supra-region, country and fleet segment. Global Fishing Watch (GFW) data are used to provide total fishing hours per vessel and the harbours of landing.

First, the FDI data were matched to the Annual Economic Report (AER) on the EU fishing fleet according to vessel length, in order to extract the relevant economic indicators. For this example, the focus was on the Gross Value of Landings (GVL); however, any variable contained in the AER can be extracted.

In parallel, Global Fishing Watch (GFW) data was used to provide information on the effort (hours of fishing) of each individual vessel and their associated harbours for the selected case study. The MMSI identifier was used to match GFW individual vessels with the EU Fleet Register to assign vessel lengths. Finally, individual vessels were grouped by ICES cell and length to align with the FDI dataset.

Note: Here, we use public GFW data to demonstrate the procedure in cases where access to other types of data is not possible. The GFW dataset is, however, restricted to a few fisheries, as it only covers vessels with AIS. If more detailed data are available—such as those obtained from the Vessel Monitoring System (VMS)—the procedure can be reliably replicated using that dataset as well.

The FDI-AER and GFW datasets were merged using common identifiers (ICES cell and vessel length). The resulting combined dataset allows the FDI-AER data to be disaggregated by individual vessel, computed by dividing the total values by the number of vessels in each vessel-length class and ICES cell. This enables spatially explicit estimates of GVL at the vessel level.

Procedure scheme

1 Protocol - FDI effort by ICES cell

In this section, the methodology for disaggregating the data is explained step by step through its application to a specific area, GSA09 (Northern Tyrrhenian Sea).

First, to ensure proper data management, download and save the data in a folder dedicated to this purpose.

The data to be downloaded include:

- FDI effort and landing: effort and landing data divided by year, country, GSA, gear, vessel length and ICES cell. The geographical reference, expressed as ICES-cell longitude-latitude coordinates, is provided in the shapefiles.
- AER: economic indicators by year, country, GSA, gear, and vessel length.
- Fleet register: descriptive information on individual vessels (vessel name, MMSI identifier, vessel length, port of registration, tonnage, power, gear, etc.).
- EMODNET main ports for the European Seas: main ports' location data from 1997 to 2024.
- FAO ASFIS List of Species for Fisheries: the ASFIS (Aquatic Sciences and Fisheries Information System) list for fishery statistics, the standard taxonomic reference system for the FAO Statistics Team.
- FAO Geographical Sub-Areas: the FAO GFCM area of application, comprising the Mediterranean and the Black Sea (Major Fishing Area 37).

Save the data to a folder and set the folder as the data location in the R environment:

wd = "SET YOUR DATA FOLDER DIRECTORY"
library(curl)
library(dplyr)
library(doBy)
library(ggplot2)
library(ggrepel)
library(ggridges)
library(ggspatial)
library(gfwr)
library(gridExtra)
library(gtsummary)
library(leaflet)
library(openxlsx)
library(patchwork)
library(RColorBrewer)
library(reshape2)
library(rnaturalearth)
library(rnaturalearthdata)
library(sf)
library(tidyverse)
library(tidytext)
library(terra)
library(VennDiagram)
library(webr)
library(webshot2)

Data manipulation for a case study area

Users can set the parameters for their case study, which will inform the rest of the procedure.

Here we test Italian Bottom Otter Trawlers (ITA-OTB) in 2021 for GSA09.

CS_name = "FAO GSA09 - Western Med"
Gear_CS = "OTB"
Year_CS = "2021"
Country_CS = "ITA"
Country_code = "IT"
GSAs_CS = "GSA09" # GSA code as used in the GSA shapefile (SECT_COD field)
GSAa_CS = "GSA9"  # GSA code as used in the FDI sub_region field

Step 1 - Open and subset FDI data Effort

Open and subset data from FDI by: Gear type, Year, and Country

effort = read.csv(paste0(wd,"FDI_spatial_data_EU28/EU28/spatial_effort_tableau_pts_EU28.csv")) 
effort = effort %>%  
         filter(year %in% Year_CS & gear_type %in% Gear_CS & cscode != "") %>%  
         mutate(totfishdays = as.numeric(totfishdays))

Subsequently, spatial ICES cells were used to map the total effort and landing data.

spatial_effort = read_sf(paste0(wd,"FDI_spatial_data_EU28/EU28/effort_csquares.shp"))
spatial_effort = spatial_effort %>% 
                 filter(!is.na(cscode) & cscode != "")

#Join data 

effort_sf = st_as_sf(left_join(effort, spatial_effort, by = "cscode"))

Then the GSA polygon was used to subset and plot the landing and effort on the map.

# Effort and landing by GSA

GSA = read_sf(paste0(wd,"GSAs_simplified.shp")) %>%
       filter(SECT_COD == GSAs_CS)

effort_GSA = effort_sf %>% 
              filter(sub_region == GSAa_CS)
            
effort_GSA = st_intersection(effort_GSA, GSA)


effort_sf = effort_GSA

CS = GSA

# Set parameters for the map

world <- ne_countries(scale = "medium", returnclass = "sf", continent = "europe")
world = st_transform(world, crs = st_crs(CS))

xmin = as.numeric(st_bbox(effort_sf)[1])-0.1
xmax = as.numeric(st_bbox(effort_sf)[3])+0.1
ymin = as.numeric(st_bbox(effort_sf)[2])-0.1
ymax = as.numeric(st_bbox(effort_sf)[4])+0.1

Total effort coverage for the case study area - resulting from FDI data

eff = ggplot()+
  geom_sf(data = effort_sf, aes(fill = log(totfishdays)), color = NA)+
  scale_fill_viridis_c(option = "A", na.value = "white")+ 
  geom_sf(data = world)+
  coord_sf(xlim = c(xmin, xmax), ylim = c(ymin, ymax))+
  annotation_scale(location = "tl", width_hint = 0.5) +
  annotation_north_arrow(location = "tl", which_north = "true", 
                         pad_x = unit(0.75, "in"), pad_y = unit(0.5, "in"),
                         style = north_arrow_fancy_orienteering) +
  ggtitle(paste0("FDI Effort_",CS_name,"_",Gear_CS,"_",Year_CS))+
  theme_light()+
  theme(legend.position = "bottom")

print(eff)

Save all the filtered data in a dedicated folder, e.g. fd = "CaseStudy/Data/":

fd = "CaseStudy/Data/"

write_sf(effort_sf, paste0(fd,"FDI_effort_CS.shp"))

write.csv(
  st_drop_geometry(effort_sf) %>%
    rename(id = cscode, gear = gear_type, vlength = vessel_length, tot_fish_day = totfishdays), paste0(fd,"FDI_effort_CS.csv"), row.names = F)

remove(effort)

Descriptive analysis - Effort

FDI_effort_CS = read.csv(paste0(fd,"FDI_effort_CS.csv")) %>% mutate(quarter = as.character(quarter))

ggplot()+
  geom_density_ridges(data = FDI_effort_CS, aes(y = fct_reorder(vlength,tot_fish_day), x = tot_fish_day, fill = vlength),alpha = 0.5)+
  theme_minimal()+
   scale_fill_brewer(palette = "Set1")+

  ylab("Vessel Length")+
  xlab("Total fishing days")+
  ggtitle(paste0("FDI Effort_",CS_name,"_",Gear_CS,"_",Year_CS, "\nby vessel length"))

FDI_CS_data = FDI_effort_CS

Step 2 - Open and subset AER data

# DTS (demersal trawlers and/or demersal seiners) is the AER fishing technology corresponding to OTB
AER_FS = read.xlsx(paste0(wd,"STECF_24-07_EU Fleet Economic and Transversal data/STECF 24-07 - EU Fleet Economic and Transversal data_fleet segment level.xlsx"), sheet = 2) %>% 
           filter(year %in% Year_CS & fishing_tech %in% c("DTS") & country_code %in% Country_CS) 

write.csv(AER_FS, paste0(fd,"Economic_data.csv"), row.names = F)

Descriptive analysis - AER data

AER_CS = read.csv(paste0(fd,"Economic_data.csv"))

AER_CS = AER_CS %>%  
          rename(vlength = vessel_length) %>% 
          select(c(country_code, year, fishing_tech, vlength, variable_group, variable_name, variable_code, value, unit ))

data_sum = AER_CS %>% 
            group_by(country_code, fishing_tech,vlength, variable_group, variable_name, unit) %>% 
            summarise(val = sum(value,na.rm = T)) %>% 
            rename(gear = fishing_tech)

 data_sum %>% 
   group_by(variable_group, variable_name) %>% 
   mutate(val_prop = val/sum(val)*100) %>% 
   ggplot()+
   geom_bar(aes(y = variable_name, x= val_prop, fill = vlength), stat = "identity")+
   facet_wrap(~ variable_group, scales = "free")+ 
   scale_fill_brewer(palette = "Set1")+
   xlab("")+
   ylab("")+
   facet_wrap(~ gear)+
   theme_classic()

 AER_wide = AER_CS %>% 
            select(-c(variable_code, unit, variable_group)) %>% 
            dcast(...~ variable_name, value.var = "value")

 write.csv(AER_wide, paste0(fd,"Economic_data_wide.csv"), row.names = F)

Step 3 - Join AER with FDI effort-landing

FDI_CS_data = read.csv(paste0(fd,"FDI_effort_CS.csv")) %>% mutate(quarter = as.character(quarter))
AER_CS = read.csv(paste0(fd,"Economic_data.csv")) %>% rename(vlength = vessel_length)


FDI_sub = unique(FDI_CS_data[,c("year","quarter", "vlength", "id", "tot_fish_day" )])
AER_sub = AER_wide[,c("year", "vlength", "Fishing days", "Days at sea")]

Step 4 - Find vessel tracks with Global Fishing Watch (GFW)

Extract the data

In this step, we will identify all vessels present in the CS area during a defined period (here, we use the year 2021 as an example). The vessels were extracted from the GFW dataset, which uses AIS data to identify vessel tracks, fishing areas, and zones of navigation. Furthermore, it has the capacity to identify the ports visited by individual vessels. For more details see https://globalfishingwatch.org/our-apis/

The use of gfwr requires a GFW API token, which users can request from the GFW API Portal. Save this token to your .Renviron file using usethis::edit_r_environ() and adding a variable named GFW_TOKEN to the file (GFW_TOKEN="PASTE_YOUR_TOKEN_HERE"). Save the .Renviron file and restart the R session to make the edit effective.

gfwr functions use key = gfw_auth() by default, so in general you should not need to pass the key explicitly in your function calls.
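Before calling the API, it may help to confirm the token is actually visible to the session; a minimal sketch (the helper name check_gfw_token is ours, not part of gfwr):

```r
# Returns TRUE if a non-empty GFW API token is set in the environment
check_gfw_token <- function(token = Sys.getenv("GFW_TOKEN")) {
  nzchar(token)
}

# Stop early with a clear message if the token is missing
if (!check_gfw_token()) {
  message("GFW_TOKEN not set: edit .Renviron and restart R")
}
```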

key = gfw_auth() 
CS_polygon <- sf::st_bbox(c(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax),
  crs = 4326) |>
  sf::st_as_sfc() |>
  sf::st_as_sf()

GFW_effort = get_raster(spatial_resolution = 'LOW',
                 temporal_resolution = 'MONTHLY',
                 group_by = 'VESSEL_ID',
                 start_date = "2021-01-01",
                 end_date = "2021-12-31",
                 region = CS_polygon,
                 region_source = 'USER_SHAPEFILE',
                 key = key)

colnames(GFW_effort) = make.names(colnames(GFW_effort))

GFW_effort %>% 
  group_by(Flag,Gear.Type) %>% 
  summarise(h = sum(Apparent.Fishing.Hours)) %>% 
  ggplot()+
  geom_bar(aes(x = h, y = reorder(Gear.Type, h), fill = Flag), stat = "identity")+
  ggtitle("GFW data from CS polygon")+
  xlab("Fishing hours")+
  ylab("Gear type")+
  theme_light()

write.csv(GFW_effort, paste0(fd,"GFW_effort_tot_CS.csv"))

Subset by Country, Gear, and CS Area

FDI_effort_CS_sf = read_sf(paste0(fd,"FDI_effort_CS.shp"))
GFW_effort = read.csv(paste0(fd,"GFW_effort_tot_CS.csv"))

GFW_effort_CS_sf = GFW_effort %>%
                     filter(Flag == "ITA" & Gear.Type %in% c("TRAWLERS")) %>% 
                     st_as_sf(coords = c("Lon", "Lat"), crs = 4326) 

GFW_effort_CS_sf$month = as.integer(substr(GFW_effort_CS_sf$Time.Range, 6,7))
GFW_effort_CS_sf$quarter = as.character(c(1,2,3,4)[findInterval(GFW_effort_CS_sf$month, c(1,4,7,10))])
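The month-to-quarter mapping can be verified in isolation; with calendar quarters, the findInterval breakpoints are c(1, 4, 7, 10) (months 1-3 -> Q1, 4-6 -> Q2, 7-9 -> Q3, 10-12 -> Q4):

```r
# Map months 1-12 to calendar quarters using findInterval
months <- 1:12
qtr <- c("1", "2", "3", "4")[findInterval(months, c(1, 4, 7, 10))]
# qtr is "1","1","1","2","2","2","3","3","3","4","4","4"
```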

Perform the spatial joins quarter by quarter to ensure temporal alignment:

FDI_effort_CS_sf_by_quarter = FDI_effort_CS_sf %>%
                      group_by(quarter,cscode, ger_typ) %>%
                      summarise(FDI_tot_fish_day_by_ICES = sum(ttfshdy))

quarter = c("1","2","3","4")

GFW_effort_CS_sf_grid <- NULL  
for(i in 1:length(quarter)) {
  a <- st_join(
    GFW_effort_CS_sf[which(GFW_effort_CS_sf$quarter %in% quarter[i]), ],
    FDI_effort_CS_sf_by_quarter[which(FDI_effort_CS_sf_by_quarter$quarter %in% quarter[i]), "cscode"],
    left = T
  )

  GFW_effort_CS_sf_grid <- rbind(GFW_effort_CS_sf_grid, a)
}
  
GFW_effort_CS_sf_grid = GFW_effort_CS_sf_grid %>% 
                        filter(!is.na(cscode)) %>% 
                        rename(id = cscode) 


write_sf(GFW_effort_CS_sf_grid, paste0(fd,"GFW_effort_CS_sf_grid.shp"))
write.csv(
  st_drop_geometry(GFW_effort_CS_sf_grid),paste0(fd,"GFW_effort_CS_sf_grid.csv"), row.names = F)

Step 5 - Find ports visited with Global Fishing Watch (GFW)

Download the ports visited from the GFW database, for the CS vessels only.

Please note that this phase can take a long time.

vID = unique(GFW_effort_CS_sf_grid$Vessel.ID)



# Initialize port_FV with correct column types
port_FV <- data.frame(
  port = character(), 
  lat = numeric(), 
  lon = numeric(), 
  vessel_name = character(),
  MMSI = character(),
  month = character(),
  stringsAsFactors = FALSE
)

for (i in 1:length(vID)) {
  
  port_event <- get_event(
    event_type   = 'PORT_VISIT',
    start_date   = "2021-01-01",
    end_date     = "2021-12-31",
    region       = CS_polygon,
    vessels      = vID[i],
    region_source= 'USER_SHAPEFILE',
    key          = key
  )
  
  if (is.null(port_event) || nrow(port_event) == 0) next
  
  for (j in seq_len(nrow(port_event))) {
    
    # Extract values safely, replacing NULL with NA
    port_name <- port_event$event_info[[j]]$startAnchorage$name
    lat <- port_event$event_info[[j]]$startAnchorage$lat
    lon <- port_event$event_info[[j]]$startAnchorage$lon
    vessel_name <- port_event$vessel_name[j]
    MMSI <- port_event$vessel_ssvid[j]
    month <- as.character(month(port_event$start[j]))
    

    # Create the data frame with NULL-safe values
    port_event_df <- data.frame(
      port = ifelse(length(port_name) == 0, NA, port_name),
      lat = ifelse(length(lat) == 0, NA, lat),
      lon = ifelse(length(lon) == 0, NA, lon),
      vessel_name = ifelse(length(vessel_name) == 0, NA, vessel_name),
      MMSI = ifelse(length(MMSI) == 0, NA, MMSI),
      month = ifelse(length(month) == 0, NA, month),

      stringsAsFactors = FALSE
    )
    
    # Append the row to the result dataframe
    port_FV <- bind_rows(port_FV, port_event_df)
  }
}

# Remove duplicates and drop rows with NA values
port_CS_OTB <- port_FV %>% 
  unique() %>% 
  drop_na() %>% 
  mutate(quarter = case_when(
          month %in% c("1", "2", "3") ~ "1",
          month %in% c("4", "5", "6") ~ "2",
          month %in% c("7", "8", "9") ~ "3",
          month %in% c("10", "11", "12") ~ "4"  )) %>% 
  group_by(port, vessel_name, MMSI, quarter) %>% 
  summarise(lat = mean(lat), lon = mean(lon))



write.xlsx(port_CS_OTB, paste0(fd,"GFW_port_CS.xlsx"))

Descriptive analysis

GFW_port_CS = read.xlsx(paste0(fd,"GFW_port_CS.xlsx"))

GFW_port_CS %>% 
  group_by(port) %>%
  summarise(
    lon = mean(lon, na.rm = TRUE),
    lat = mean(lat, na.rm = TRUE),
    nvessel = n()) %>% 
 
  ggplot()+
  geom_bar(aes(y = reorder(port,nvessel) , x = nvessel), stat = "identity")+
  theme_light()+
  ggtitle("Harbours by number of vessels")+
  xlab("Number of vessels")+
  ylab("")

Open the Fleet Register and add vessel length (LOA) by MMSI and vessel name to the GFW data

Keep only the vessels whose MMSI is also present in the Fleet Register, and add the vessel length.

fleetReg = read.csv(paste0(wd,"vesselRegistryListResults.csv"), sep = ";")

fleetReg[fleetReg$Main.fishing.gear %in% c("TBN","OTS", "TBS", "OT"), "Main.fishing.gear"] <- "OTB"
fleetReg[fleetReg$Main.fishing.gear %in% c("SV","SX"), "Main.fishing.gear"] <- "SDN"
fleetReg[fleetReg$Main.fishing.gear %in% c("DRM", "DRH") , "Main.fishing.gear"] <- "DRB"
fleetReg[fleetReg$Main.fishing.gear %in% c("GTN","GNC", "GN", "FIX"), "Main.fishing.gear"] <- "GNS"
fleetReg[fleetReg$Main.fishing.gear %in% "SPR", "Main.fishing.gear"] <- "SSC"
fleetReg[fleetReg$Main.fishing.gear %in% c("SB", "NK"), "Main.fishing.gear"] <- "MIS"
fleetReg[fleetReg$Main.fishing.gear %in% c("LL", "LX"), "Main.fishing.gear"] <- "LLS"


fleetReg_info = fleetReg %>% 
                 rename(vessel_name = "Name.of.vessel", Gear = "Main.fishing.gear", Country ="Country.of.Registration") %>% 
                  mutate(MMSI = as.character(MMSI)) %>% 
                  filter(Country %in% Country_CS) %>% 
                  filter(Gear %in% Gear_CS) 


fleetReg_info$vlength = c("VL0006","VL0612","VL1218", "VL1824", "VL2440", "VL40XX" )[findInterval(fleetReg_info$LOA, c(0,6,12,18,24,40))]

write.csv(fleetReg_info, paste0(fd,"FleetReg_info_CS.csv"), row.names = F)
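The length-class binning can be sanity-checked with a few toy LOA values; note that leaving the breaks open-ended at 40 keeps any vessel of 40 m or more in VL40XX:

```r
# Bin vessel length overall (LOA, metres) into the FDI/AER length classes
bins <- c("VL0006", "VL0612", "VL1218", "VL1824", "VL2440", "VL40XX")
loa  <- c(5.5, 11.9, 17.0, 23.5, 39.0, 45.0)  # toy values, one per class
vlength <- bins[findInterval(loa, c(0, 6, 12, 18, 24, 40))]
```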

About 50 vessels are lost because they have no MMSI associated in the Fleet Register.

fleetReg_info = read.csv(paste0(fd,"FleetReg_info_CS.csv")) %>% 
                select(vessel_name, MMSI, Gear, vlength) %>% 
                mutate(MMSI = as.character(MMSI)) %>% 
                rename(name_vreg = vessel_name)

GFW_port_CS_fReg = GFW_port_CS %>% 
                  left_join(fleetReg_info, by = "MMSI") %>% 
                  filter(MMSI %in% fleetReg_info$MMSI)
                    

GFW_effort_CS_sf_grid_fReg = read.csv(paste0(fd,"GFW_effort_CS_sf_grid.csv")) %>% 
                              mutate(MMSI = as.character(MMSI)) %>% 
                              left_join(fleetReg_info, by = "MMSI") %>% 
                              filter(MMSI %in% unique(fleetReg_info$MMSI))


write.csv(GFW_effort_CS_sf_grid_fReg, paste0(fd,"GFW_effort_CS_sf_grid_fReg.csv"))
write.csv(GFW_port_CS_fReg, paste0(fd,"GFW_port_CS_fReg.csv"), row.names = F)

Check for main ports

We filtered for the main ports resulting from the EMODNET dataset: Main Ports (Vessels Traffic by Type 1997-2024).

Since the two datasets are not perfectly comparable, we first identify all the GFW ports that are also present in the EMODNET dataset by performing a join on the port name. Then, a buffer of 3 km is created around the EMODNET ports, and the GFW ports within that buffer are assigned the same name as the EMODNET ports.

During this phase, it is essential to consult a Case Study expert who will be able to manually modify the port name in the GFW dataset, where feasible.
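The idea behind the buffer step can be sketched in base R with a haversine distance check and toy coordinates (the actual procedure below uses sf::st_buffer on the EMODNET points):

```r
# Haversine distance (metres) between two lon/lat points
haversine <- function(lon1, lat1, lon2, lat2, R = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * R * asin(sqrt(a))
}

# Rename an unmatched GFW port when it lies within 3 km of an EMODNET port
# (toy coordinates for illustration only)
gfw  <- data.frame(port = "ITA-110",    lon = 10.43, lat = 42.81)
emod <- data.frame(port = "RIO MARINA", lon = 10.43, lat = 42.82)

d <- haversine(gfw$lon, gfw$lat, emod$lon, emod$lat)
if (d <= 3000) gfw$port <- emod$port
```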

CS_polygon <- sf::st_bbox(c(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax),
  crs = 4326) |>
  sf::st_as_sfc() |>
  sf::st_as_sf()

GFW_port_CS_fReg = read.csv(paste0(fd,"GFW_port_CS_fReg.csv"))

EMODNET_port_sf = read_sf(paste0(wd,"EMODnet_HA_MainPorts_Traffic_20241112/EMODnet_HA_MainPorts_Ports2025_20241112.shp")) %>% 
  filter(CNTR_CODE %in% Country_code) %>% 
  st_intersection(CS_polygon)


GFW_port_CS = GFW_port_CS_fReg 
                

GFW_port_sf = GFW_port_CS %>% st_as_sf(coords = c("lon","lat"), crs = st_crs(EMODNET_port_sf))

EMODNET_port_sf$PORT_NAME = toupper(EMODNET_port_sf$PORT_NAME)

Emo_port = EMODNET_port_sf$PORT_NAME
GFW_port = unique(GFW_port_sf$port)

Analyse the common port names between the EMODNET and GFW datasets, and change names where possible.

# Generate initial Venn diagram
v <- venn.diagram(
  list(Emo_port = Emo_port, GFW_port = GFW_port),
  fill = c("orange", "blue"),
  alpha = c(0.5, 0.5), 
  cat.cex = 1.0, 
  cex = 1.0,
  filename = NULL
)

# Inspect original labels (optional, can be commented out)
# lapply(v, function(i) i$label)

# Customize labels - these indices (5-7) may need adjustment based on your data
v[[5]]$label <- paste(setdiff(GFW_port, Emo_port), collapse = "\n")  # GFW_port only
v[[6]]$label <- paste(setdiff(Emo_port, GFW_port), collapse = "\n")  # Emo_port only
v[[7]]$label <- paste(intersect(Emo_port, GFW_port), collapse = "\n") # Intersection

# Render the plot
grid.newpage()
grid.draw(v)

GFW_port_sf = GFW_port_sf %>% 
               mutate(port = case_when(
                  port == "CALA GALERA" ~ "PORTO ERCOLE",
                  port == "GENOA" ~ "GENOVA",
                  port == "GIGLIO PORTO" ~ "ISOLA DEL GIGLIO",
                  port == "PORTO FERRAIO" ~ "PORTOFERRAIO",
                  TRUE ~ port  # Keep other values unchanged
                ))

GFW_port = unique(GFW_port_sf$port)

Portdiff = setdiff(Emo_port, GFW_port)



EMODNET_port_buffer_GFW = EMODNET_port_sf %>% 
                          filter(PORT_NAME %in% Portdiff) %>% 
                          st_buffer(dist = 3000) %>% 
                          st_intersection(GFW_port_sf)
                          
# inspect EMODNET_port_buffer_GFW manually


GFW_port_sf = GFW_port_sf %>% 
                mutate(port = case_when(
                      port == "ITA-110" ~ "RIO MARINA",
                      port == "ITA-167" ~ "CAVO",
                      TRUE ~ port ))


Emo_port = EMODNET_port_sf$PORT_NAME
GFW_port = unique(GFW_port_sf$port)
# Generate plot
v <- venn.diagram(list(Emo_port = Emo_port, GFW_port = GFW_port),
                  fill = c("orange", "blue"),
                  alpha = c(0.5, 0.5), cat.cex = 1.0, cex=1.0,
                  filename=NULL)

# lapply(v, function(i) i$label)


v[[5]]$label <- paste(setdiff(GFW_port, Emo_port), collapse="\n")  # GFW_port only
v[[6]]$label <- paste(setdiff(Emo_port, GFW_port), collapse="\n")  # Emo_port only
v[[7]]$label <- paste(intersect(Emo_port, GFW_port), collapse="\n")  # intersection

# plot  
grid.newpage()
grid.draw(v)

Finally, double-check the GFW ports without a unique identifier and assign the correct nearest port name to each. See the leaflet map as an example.

GFW ports without an ID are shown in red, matched ports in blue, and EMODNET reference ports as standard map markers.

final_port_name = intersect(Emo_port, GFW_port)

xmin = as.numeric(st_bbox(FDI_effort_CS_sf_by_quarter)[1])-0.1
xmax = as.numeric(st_bbox(FDI_effort_CS_sf_by_quarter)[3])+0.1
ymin = as.numeric(st_bbox(FDI_effort_CS_sf_by_quarter)[2])-0.1
ymax = as.numeric(st_bbox(FDI_effort_CS_sf_by_quarter)[4])+0.1

  leaflet() %>%
  addTiles() %>%
  fitBounds(lng1 = xmin, lat1 = ymin, lng2 = xmax, lat2 = ymax) %>%
  
  # All GFW ports (red)
  addCircleMarkers(
    data = GFW_port_sf,
    radius = 5,
    color = "red",
    popup = ~ port
  ) %>%
  
  # Highlighted ports (blue)
  addCircleMarkers(
    data = GFW_port_sf[which(GFW_port_sf$port %in% final_port_name),],
    radius = 5,
    color = "blue",
    popup = ~ port
  ) %>%
  
  # EMODNET reference ports
  addMarkers(
    data = EMODNET_port_sf,
    popup = ~ PORT_NAME
  )
  GFW_port_sf = GFW_port_sf %>% 
              mutate(port = case_when(
                      port == "ITA-338" ~ "ANZIO",
                      port == "ITA-206" ~ "CIVITAVECCHIA",
                      port == "SANTA MARINELLA" ~ "CIVITAVECCHIA",
                      port == "ITA-368" ~ "CIVITAVECCHIA",
                      port == "ITA-297" ~ "PORTO ERCOLE",
                      port == "ITA-115" ~ "PORTOFERRAIO",
                      port == "MARCIANA MARINA" ~ "PORTOFERRAIO",
                      port == "MARINA DI SALIVOLI" ~ "PIOMBINO",
                      port == "PORTOVENERE" ~ "LA SPEZIA",
                      port == "LAVAGNA" ~ "CHIAVARI",
                      port == "SESTRI LEVANTE" ~ "CHIAVARI",
                      port == "CAMOGLI" ~ "GENOVA",
                      port == "ITA-273" ~ "GENOVA",
                      port == "VARAZZE" ~ "SAVONA",
                      port == "SAN LORENZO AL MARE" ~ "IMPERIA",
                      port == "SANTO STEFANO" ~ "IMPERIA",
                      port == "SANREMO" ~ "IMPERIA",
                      TRUE ~ port )) 
                
  
  
  
  GFW_port = unique(GFW_port_sf$port)
  final_port_name = intersect(Emo_port, GFW_port)
  
  EMODNET_final_port = EMODNET_port_sf %>% select(PORT_NAME) %>% filter(PORT_NAME %in% final_port_name) %>% rename(port = PORT_NAME)

  
  GFW_port_sf = st_drop_geometry(GFW_port_sf) %>% 
                left_join(EMODNET_final_port) %>% 
                st_as_sf()
 
  GFW_port_sf %>% 
    mutate(n = 1) %>% 
    group_by(port) %>%
    summarise(nvessel = sum(n)) %>%
    ggplot()+
  geom_bar(aes(y = reorder(port,nvessel) , x = nvessel), stat = "identity")+
  # geom_vline(xintercept = 10, color = "red")+
  theme_light()+
  ggtitle("Harbours by number of vessels")+
  xlab("Number of vessels")+
  ylab("")

write_sf(GFW_port_sf, paste0(fd,"GFW_port_CS_fReg_coords.shp"))  
write.csv(data.frame(st_coordinates(GFW_port_sf), st_drop_geometry(GFW_port_sf)), row.names = F, paste0(fd,"GFW_port_CS_fReg_coords.csv"))

We have now obtained the number of vessels for the main ports, and we must link them to the effort. However, we cannot retain individual-boat information directly, because some vessels are associated with several different ports. So we make a final modification:
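The fallback rule applied below (a vessel seen in more than one port is assigned its port of registration) can be sketched in base R with hypothetical data:

```r
# Toy visit table: vessel A is seen in two ports, vessel B in one
visits <- data.frame(
  MMSI = c("A", "A", "B"),
  port = c("GENOVA", "SAVONA", "ANZIO"),
  reg  = c("IMPERIA", "IMPERIA", "ANZIO"),  # port of registration
  stringsAsFactors = FALSE
)

# Number of distinct ports per vessel
n_port <- tapply(visits$port, visits$MMSI, function(p) length(unique(p)))

# Multi-port vessels fall back to their registration port
visits$final_port <- unname(
  ifelse(n_port[visits$MMSI] > 1, visits$reg, visits$port)
)
```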

GFW_port_CS = read.csv(paste0(fd,"GFW_port_CS_fReg_coords.csv")) %>% mutate(MMSI = as.character(MMSI))

fleetReg_place_reg = read.csv(paste0(fd,"FleetReg_info_CS.csv")) %>%  
                    select(MMSI, Place.of.registration.name) %>% 
                    mutate(MMSI = as.character(MMSI))

GFW_port_fREG = GFW_port_CS %>% left_join(fleetReg_place_reg)
GFW_port_fREG <- GFW_port_fREG %>%
                  group_by(MMSI) %>%
                  mutate(
                    port_count = n_distinct(port),
                    match_port = if_else(port_count > 1, Place.of.registration.name, port),
                    final_port = if_else(is.na(match_port), port, match_port )
                  ) %>%
                  mutate(final_port = toupper(final_port)) %>% 
                  ungroup() %>%
                  select(-port_count)  

setdiff(unique(GFW_port_fREG$final_port), unique(GFW_port_fREG$port))
## [1] "SESTRI LEVANTE"          "SANTA MARGHERITA LIGURE"
## [3] "ROMA"                    "SAN REMO"
 GFW_port_fREG <- GFW_port_fREG %>%
          mutate(final_port = case_when(
                      final_port == "ROMA" ~ "FIUMICINO",
                      final_port == "SAN REMO" ~ "IMPERIA",
                      TRUE ~ final_port ))
 
 setdiff(unique(GFW_port_fREG$final_port), unique(GFW_port_fREG$port))  
## [1] "SESTRI LEVANTE"          "SANTA MARGHERITA LIGURE"
 GFW_port_fREG %>% 
   filter(final_port %in% c("SESTRI LEVANTE" , "SANTA MARGHERITA LIGURE")) %>% 
   distinct(vessel_name,final_port)
## # A tibble: 4 × 2
##   vessel_name  final_port             
##   <chr>        <chr>                  
## 1 JAZZ         SESTRI LEVANTE         
## 2 ARDITO       SANTA MARGHERITA LIGURE
## 3 SGIUSEPPE    SESTRI LEVANTE         
## 4 TERESA MADRE SANTA MARGHERITA LIGURE
  GFW_port_fREG <- GFW_port_fREG %>%
          mutate(final_port = case_when(
                      final_port == "SESTRI LEVANTE" & vessel_name == "JAZZ" ~ "CHIAVARI", 
                      final_port == "SANTA MARGHERITA LIGURE" & vessel_name == "ARDITO" ~ "CHIAVARI",
                      final_port == "SANTA MARGHERITA LIGURE" & vessel_name == "TERESA MADRE" ~ "CHIAVARI",
                      final_port == "SESTRI LEVANTE" &  vessel_name == "SGIUSEPPE" ~ "CHIAVARI",
                      TRUE ~ final_port ))
 
  setdiff(unique(GFW_port_fREG$final_port), unique(GFW_port_fREG$port)) 
## character(0)
  unique(GFW_port_fREG$final_port)
##  [1] "ANZIO"               "PORTO ERCOLE"        "PORTO SANTO STEFANO"
##  [4] "CIVITAVECCHIA"       "GENOVA"              "CHIAVARI"           
##  [7] "FIUMICINO"           "PIOMBINO"            "LIVORNO"            
## [10] "PORTOFERRAIO"        "SAVONA"              "IMPERIA"            
## [13] "PORTOFINO"           "LA SPEZIA"           "VIAREGGIO"
port_coords = unique(GFW_port_fREG[c("X","Y","port")]) %>% 
              rename(lon_port = X,
                     lat_port = Y)

GFW_port_fREG_CS = GFW_port_fREG %>% 
                    select(-c(X,Y,port,match_port)) %>% 
                    rename(port = final_port) %>% 
                    left_join(port_coords)
  

GFW_port_fREG_CS_sf = GFW_port_fREG_CS %>%
   select(port,lon_port,  lat_port) %>% 
   unique() %>% 
   st_as_sf(coords = c("lon_port", "lat_port"), crs = st_crs(GSA))

 ggplot() +
  geom_sf(data = world)+
  geom_sf(data = GFW_port_fREG_CS_sf)+
  geom_label_repel(
      data = GFW_port_fREG_CS %>% select(port,lon_port,  lat_port) %>% 
   unique() ,
      aes(x = lon_port, y = lat_port, label = port),
      size = 3,
      min.segment.length = 0
    )   +
  coord_sf(xlim = c(xmin, xmax), ylim = c(ymin, ymax))+
  theme_minimal()+
  ggtitle("Map of main port for the CS")+
  xlab("Longitude")+
  ylab("Latitude")

  write.csv(GFW_port_fREG_CS, paste0(fd,"GFW_port_CS_fReg_coords.csv"), row.names = F)   
  GFW_port_fREG_sf = st_as_sf(GFW_port_fREG_CS, coords = c("lon_port","lat_port"), crs = 4326)
  
  write_sf(GFW_port_fREG_sf, paste0(fd,"GFW_port_CS_fReg_coords.shp")) 

Find the number of vessels by each port

Join vessels by port to find how many vessels are in each port

-> Remove quarter: the data are aggregated by year and the seasonal variation is removed, because the AIS data in this case study do not have optimal temporal resolution.

GFW_port_fREG = read.csv(paste0(fd,"GFW_port_CS_fReg_coords.csv")) %>% 
              select(MMSI, Gear, vlength, port, lon_port, lat_port) %>% 
                   mutate(MMSI = as.character(MMSI)) %>% 
                    unique()

GFW_effort_CS_sf = read_sf(paste0(fd,"GFW_effort_CS_sf_grid.shp")) %>% 
  filter(MMSI %in% unique(GFW_port_fREG$MMSI)) %>% 
  dplyr::select(MMSI, App_F_H, id) %>% 
  mutate(MMSI = as.character(MMSI))


GFW_effort_CS_sf =GFW_effort_CS_sf  %>% 
  left_join(GFW_port_fREG, by = "MMSI")

write_sf(GFW_effort_CS_sf, paste0(fd,"GFW_effort_port_by_icell.shp"))

Step 6 - Disaggregation process

AER disaggregation by ICES cell

Starting from the data collected by NISEA, it is possible to fit a series of economic models.

Compute the disaggregated economic value for each ICES cell:

\[ Effort~share = \frac{Fishing~days~by~cell}{Total~fishing~days~by~vessel~length} \]

\[ GVL~by~cell = GVL \times Effort~share \]
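A toy worked example of these two formulas, with hypothetical numbers (three cells fished by a single vessel-length class, total GVL of 1000):

```r
# Fishing days per ICES cell for one vessel-length class (toy values)
day <- c(c1 = 10, c2 = 30, c3 = 60)
GVL <- 1000  # total gross value of landings for this segment

effort_share <- day / sum(day)        # shares sum to 1
GVL_by_cell  <- GVL * effort_share    # 100, 300, 600
```

The cell values sum back to the segment total, so no value is created or lost by the disaggregation.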

AER = read.csv(paste0(fd,"Economic_data_wide.csv")) %>% rename(vssl_ln = vlength) %>% 
  dplyr::select(c("vssl_ln", "Fishing.days", "Energy.costs", "Gross.value.of.landings"))
  
FDI_effort_CS_sf = read_sf(paste0(fd,"FDI_effort_CS.shp"))

FDI_id = FDI_effort_CS_sf %>% 
          group_by(cscode, ger_typ, vssl_ln ) %>% 
          summarise(day = sum(ttfshdy)) %>% 
          rename(id = cscode)

FDI_vl = FDI_id %>% 
          group_by(ger_typ,vssl_ln) %>% 
          mutate(tot_day = sum(day)) %>% 
          st_drop_geometry() %>% 
          ungroup()

FDI_vl <- FDI_vl %>%
  group_by(ger_typ) %>% 
  mutate(effort_share = day / tot_day)

FDI_vl <- FDI_vl %>%
          left_join(AER, by = c("vssl_ln"))

FDI_vl <- FDI_vl %>%
  mutate(across(c(Fishing.days, Energy.costs, Gross.value.of.landings), ~ .x * effort_share, .names = "{.col}_AER")) %>% 
  dplyr::select(-c(Fishing.days,Energy.costs,Gross.value.of.landings))


write.csv(FDI_vl, paste0(fd,"FDI_AER_by_icell.csv"), row.names = F)

Add GFW data by gear and vessel length

GFW_effort_port_by_icell = read_sf(paste0(fd,"GFW_effort_port_by_icell.shp")) %>% 
  rename(vssl_ln = vlength)

FDI_AER_by_icell = read.csv(paste0(fd,"FDI_AER_by_icell.csv"))



GFW_id = GFW_effort_port_by_icell %>% 
          group_by(id, Gear, vssl_ln) %>%  
          summarise(h = sum(App_F_H)) %>% 
          st_drop_geometry()

FDI.AER_id = FDI_AER_by_icell %>% 
          group_by(id,  ger_typ, vssl_ln ) %>% 
          summarise(day = sum(Fishing.days_AER)) 

### check FDI-AER effort against GFW effort

FDI_GFW = inner_join(GFW_id, FDI.AER_id)                   


FDI_GFW_plot = ggplot(data = FDI_GFW, aes(x = day, y = h))+
  geom_point()+
  geom_smooth(method = "lm")+
  theme_bw()+
  xlab("FDI-AER - Fishing days \nby ices cell")+
  ylab("GFW - Fishing hours \nby ices cell")

FDI_GFW_plot

FDI effort - AER disaggregation by spatial ICES cells

Since there is a linear relationship between GFW effort and FDI fishing days, we can simply divide the FDI-AER data in each cell by the number of vessels operating in that cell.

\[ FDI\text{-}AER~by~vessel = \frac{FDI\text{-}AER~data}{number~of~vessels~by~length~and~ices~cell} \]
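A minimal illustration of the per-vessel split (hypothetical numbers):

```r
library(dplyr)

# Toy data (hypothetical): cell-level FDI-AER GVL and the number of GFW vessels
# of a given length class observed in each cell
toy <- data.frame(
  id       = c("cell_A", "cell_B"),
  GVL_cell = c(600, 400),
  n_track  = c(3, 2)
)

# Each vessel in a cell receives an equal share of that cell's value
toy <- toy %>% mutate(GVL_by_vessel = GVL_cell / n_track)
```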

gfw_df = GFW_effort_port_by_icell %>% 
          group_by(id, vssl_ln) %>% 
          mutate(n_track = n()) %>%   # number of vessel tracks per [cell, length] group
          ungroup()

gfw_fdi_merged_df <- gfw_df %>%
                      left_join(FDI_AER_by_icell, by = c("id", "vssl_ln"))

final_df <- gfw_fdi_merged_df %>%
    mutate(across(c(Fishing.days_AER, Energy.costs_AER, Gross.value.of.landings_AER), ~ .x / n_track, .names = "{.col}_by_vessel")) %>% 
  dplyr::select(-c(Fishing.days_AER,Energy.costs_AER,Gross.value.of.landings_AER))

Output

Check the disaggregation process

print(final_df %>% 
        group_by(id, vssl_ln) %>% 
        summarise(GFW_H = sum(App_F_H), FDI_days = sum(Fishing.days_AER_by_vessel)) %>% 
        ggplot(aes(x = GFW_H, y = FDI_days))+
        geom_point()+
        geom_smooth(method = "lm")+
        theme_bw()+
        facet_wrap(~ vssl_ln, scale = "free")+
        xlab("GFW Fishing hours \nby ices cell")+
        ylab("Disaggregated FDI-AER fishing days\nby ices cell")+
        ggtitle("Check the disaggregation process")
)

Map the Gross Value of Landings (GVL) resulting from the disaggregation process

map_GVL = ggplot()+
  geom_sf(data = final_df, aes(color = log(Gross.value.of.landings_AER_by_vessel)))+
  scale_color_viridis_c("log(GVL)",option = "D",  na.value = "white")+ 
  geom_sf(data = FDI_id, fill = NA)+
  geom_sf(data = world)+
  coord_sf(xlim = c(xmin, xmax), ylim = c(ymin, ymax))+
  annotation_scale(location = "tl", width_hint = 0.5) +
  annotation_north_arrow(location = "tl", which_north = "true", 
                         pad_x = unit(0.75, "in"), pad_y = unit(0.5, "in"),
                         style = north_arrow_fancy_orienteering) +
  ggtitle("Disaggregated GVL data result - points")+
  theme_light()

map_GVL

final_df_by_ices = FDI_id %>%
  select(geometry) %>% 
  st_join(final_df) %>% 
   ggplot() +
  geom_sf(aes(fill = log(Gross.value.of.landings_AER_by_vessel)))+
  geom_sf(data = world)+
  scale_fill_viridis_c("log(Gross Value of Landings)",option = "inferno",  na.value = "white")+ 
  coord_sf(xlim = c(xmin, xmax), ylim = c(ymin, ymax))+
  annotation_scale(location = "tl", width_hint = 0.5) +
  annotation_north_arrow(location = "tr", which_north = "true", 
                         pad_x = unit(0.1, "in"), pad_y = unit(0.1, "in"),
                         style = north_arrow_fancy_orienteering) +
  ggtitle("Disaggregated GVL data result - ICES cells")+
  theme_light()+
  theme(legend.position = "bottom")

final_df_by_ices

Gross Value of Landings (GVL) by port

n_port = final_df %>% 
         st_drop_geometry() %>% 
                select(MMSI, vssl_ln, port, Gear) %>% 
                unique() %>%
                group_by(Gear, vssl_ln, port) %>%
                summarise(n_vessel_port = n())


nvessel_weighted <- n_port %>%
                 group_by(Gear, port) %>% 
                 mutate(weight = n_vessel_port / sum(n_vessel_port)) 

GVL_by_vlength = FDI_AER_by_icell %>% 
                 group_by(ger_typ, vssl_ln) %>% 
                 summarise(GVL = mean(Gross.value.of.landings_AER)) %>% 
                 rename(Gear = ger_typ)  
                      
FDI_AER_nvessel <- GVL_by_vlength %>%
                         inner_join(nvessel_weighted, by = c("Gear", "vssl_ln"))

FDI_AER_nvessel <- FDI_AER_nvessel %>%
                          mutate(
                            GVL_by_port = GVL * weight)
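The weighting logic can be checked on a toy port (hypothetical numbers): within each [gear, port] group the length-class weights sum to 1, so the segment GVL is apportioned across ports in proportion to vessel counts.

```r
library(dplyr)

# Toy data (hypothetical): vessel counts and segment GVL for one gear at one port
toy <- data.frame(
  port          = "PORT_A",
  vlength       = c("VL1218", "VL1824", "VL2440"),
  n_vessel_port = c(2, 5, 3),
  GVL           = c(100, 250, 400)
)

toy <- toy %>%
  group_by(port) %>%
  mutate(weight      = n_vessel_port / sum(n_vessel_port),
         GVL_by_port = GVL * weight) %>%
  ungroup()
```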


print(
  FDI_AER_nvessel %>% 
    ggplot()+
    geom_bar(aes(y = reorder(port,GVL_by_port), x = GVL_by_port, fill = vssl_ln), stat = "identity", position = "dodge")+
    scale_y_reordered() +
    theme_minimal()+
    ylab("")
)

Save data results

final_df = final_df %>% 
            select(MMSI,App_F_H, id,Gear,vssl_ln,port, lon_port, lat_port, geometry) %>% 
            rename(GFW_hours = App_F_H,
                   cscode = id,
                   vlength = vssl_ln)
            
write_sf(final_df, paste0(fd, "output_df_",Year_CS,".shp"))

2 Protocol - FDI landings (species kg and price) by port

This protocol disaggregates the FDI landing data (species weight and price) over a time series.

Step 1 - Open and subset FDI Landing data

Open FDI landing data

Year_CS = c(2014:2022)

Total landings coverage for the case study area - resulting from FDI data

# Set parameters for the map

world <- ne_countries(scale = "medium", returnclass = "sf", continent = "europe")
world = st_transform(world, crs = st_crs(GSA))

xmin = as.numeric(st_bbox(GSA)[1])-0.0001
xmax = as.numeric(st_bbox(GSA)[3])+0.0001
ymin = as.numeric(st_bbox(GSA)[2])-0.0001
ymax = as.numeric(st_bbox(GSA)[4])+0.0001


#landing

for(g in 1:length(Gear_CS)){

  l_data = purrr::map(landing_sf, ~ .x %>%
             filter(gear_type == Gear_CS[g]))

  plot_list <- list()

  for(y in 1:length(Year_CS)){
    l_plot = l_data[[y]] %>% 
      group_by(year, gear_type, sub_region, cscode) %>% 
      summarise(tot_kg = sum(totwghtlandg), tot_value = sum(totvallandg)) %>% 
      ggplot()+
      geom_sf(aes(fill = tot_value))+
      geom_sf(data = world)+
      coord_sf(xlim = c(xmin, xmax), ylim = c(ymin, ymax))+
      scale_fill_viridis_c(option = "D")+ 
      ggtitle(paste0("FDI EU Landings_", Gear_CS[g], "_", Year_CS[y]))+
      theme_light()

    plot_list[[y]] <- l_plot
  }

  combined_plot <- wrap_plots(plotlist = plot_list, ncol = 3)
  print(combined_plot)
}

Remove the first three years (2014, 2015 and 2016) because they contain few data.

Save all the filtered data in a dedicated folder, fd_price = “CaseStudy/Data_price/”.

landing_sf =  landing_sf[!(names(landing_sf) %in% c("2014", "2015", "2016"))]

fd_price = "CaseStudy/Data_price/"

saveRDS(landing_sf, paste0(fd_price,"Landing_sf.RData"))

saveRDS(
  purrr::map(landing_sf, ~ .x %>% 
               st_drop_geometry() %>% 
               rename(id = cscode, gear = gear_type, vlength = vessel_length,
                      tot_fish_weight = totwghtlandg, tot_fish_value = totvallandg)),  
  paste0(fd_price,"Landing_CS.RData"))

Step 2 - Filter and clean landing data

Keep only the species that have values in all years of the time series.

landing_CS = readRDS(paste0(fd_price,"Landing_CS.RData"))

euro_species = purrr::map(
               landing_CS, ~ .x %>% 
               filter(tot_fish_weight != 0)  )

euro_species_df = do.call("rbind", euro_species)

species_all_years <- euro_species_df %>%
  group_by(gear,vlength,species) %>%  
  distinct(year, species) %>%
  summarise(n_years = n_distinct(year), .groups = "drop") %>%
  filter(n_years == length(unique(euro_species_df$year)))

euro_species_filtered <- euro_species_df %>%
                         inner_join(species_all_years, by = c("species", "vlength", "gear"))

# aggregate quarters to yearly values (weight is reported in tonnes; convert to kg)

euro_species_filtered <- euro_species_filtered %>%  
                           group_by(year, vlength, gear, id, species) %>% 
                           summarise(yearly_kg = mean(tot_fish_weight)*1000, yearly_value = mean(tot_fish_value))

Check the time series for outliers.

df <- euro_species_filtered
df$year <- as.character(df$year)

vlent <- unique(df$vlength)

empty_cases <- list()


for (g in seq_along(Gear_CS)) {
  for (vl in seq_along(vlent)) {
    
    gear_filter <- Gear_CS[g]
    vlength_filter <- vlent[vl]
    
    q_data <- df %>%
      filter(gear == gear_filter, vlength == vlength_filter)
    
    if (nrow(q_data) == 0) next
    
    p_data <- q_data %>%
      group_by(year, species) %>%
      summarise(svalue = mean(yearly_value), .groups = "drop")
    
    zero_rows <- p_data %>% filter(svalue == 0)
    
    if (nrow(zero_rows) > 0) {
      for (i in seq_len(nrow(zero_rows))) {
        empty_cases[[length(empty_cases) + 1]] <- list(
          gear = gear_filter,
          vlength = vlength_filter,
          species = zero_rows$species[i],
          year = zero_rows$year[i]
        )
      }
    }

    p <- ggplot(p_data, aes(x = year, y = svalue, group = interaction(species))) +
      geom_point() +
      geom_line() +
      facet_wrap(~ species, scales = "free_y") +
      theme_minimal() +
      ggtitle(paste0("Species price time series for ", gear_filter, "_", vlength_filter)) +
      theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
    
    print(p)
    
    q <- ggplot(q_data) +
      geom_point(aes(x = yearly_kg, y = yearly_value, color = vlength)) +
      facet_wrap(~ species, scales = "free") +
      theme_minimal() +
      ggtitle(paste0("Species price scatter for ", gear_filter, "_", vlength_filter)) +
      theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
    
    print(q)
  }
}

empty_df <- do.call(rbind, lapply(empty_cases, as.data.frame))
print(empty_df)
##     gear vlength species year
## 1    OTB  VL1218     ARA 2017
## 2    OTB  VL1218     ARS 2017
## 3    OTB  VL1218     EDT 2017
## 4    OTB  VL1218     EOI 2017
## 5    OTB  VL1218     MTS 2017
## 6    OTB  VL1218     MUT 2017
## 7    OTB  VL1218     TGS 2017
## 8    OTB  VL1218     ARA 2018
## 9    OTB  VL1218     ARS 2018
## 10   OTB  VL1218     EDT 2018
## 11   OTB  VL1218     EOI 2018
## 12   OTB  VL1218     MTS 2018
## 13   OTB  VL1218     MUT 2018
## 14   OTB  VL1218     OCC 2018
## 15   OTB  VL1218     TGS 2018
## 16   OTB  VL1218     ARA 2019
## 17   OTB  VL1218     ARS 2019
## 18   OTB  VL1218     EDT 2019
## 19   OTB  VL1218     EOI 2019
## 20   OTB  VL1218     MTS 2019
## 21   OTB  VL1218     OCC 2019
## 22   OTB  VL1218     TGS 2019
## 23   OTB  VL1218     ARA 2020
## 24   OTB  VL1218     ARS 2020
## 25   OTB  VL1218     CTC 2020
## 26   OTB  VL1218     EDT 2020
## 27   OTB  VL1218     EOI 2020
## 28   OTB  VL1218     MTS 2020
## 29   OTB  VL1218     OCC 2020
## 30   OTB  VL1218     TGS 2020
## 31   OTB  VL1218     ARA 2021
## 32   OTB  VL1218     ARS 2021
## 33   OTB  VL1218     CTC 2021
## 34   OTB  VL1218     EDT 2021
## 35   OTB  VL1218     EOI 2021
## 36   OTB  VL1218     MTS 2021
## 37   OTB  VL1218     OCC 2021
## 38   OTB  VL1218     TGS 2021
## 39   OTB  VL1218     ARS 2022
## 40   OTB  VL1218     CTC 2022
## 41   OTB  VL1218     EDT 2022
## 42   OTB  VL1218     EOI 2022
## 43   OTB  VL1218     MTS 2022
## 44   OTB  VL1218     OCC 2022
## 45   OTB  VL1218     TGS 2022
## 46   OTB  VL1824     ANE 2017
## 47   OTB  VL1824     ARA 2017
## 48   OTB  VL1824     ARS 2017
## 49   OTB  VL1824     CTC 2017
## 50   OTB  VL1824     EDT 2017
## 51   OTB  VL1824     EOI 2017
## 52   OTB  VL1824     MTS 2017
## 53   OTB  VL1824     MUT 2017
## 54   OTB  VL1824     OCC 2017
## 55   OTB  VL1824     TGS 2017
## 56   OTB  VL1824     WHG 2017
## 57   OTB  VL1824     ANE 2018
## 58   OTB  VL1824     ARA 2018
## 59   OTB  VL1824     ARS 2018
## 60   OTB  VL1824     CTC 2018
## 61   OTB  VL1824     EDT 2018
## 62   OTB  VL1824     EOI 2018
## 63   OTB  VL1824     MTS 2018
## 64   OTB  VL1824     OCC 2018
## 65   OTB  VL1824     TGS 2018
## 66   OTB  VL1824     WHG 2018
## 67   OTB  VL1824     ANE 2019
## 68   OTB  VL1824     ARA 2019
## 69   OTB  VL1824     ARS 2019
## 70   OTB  VL1824     CTC 2019
## 71   OTB  VL1824     EDT 2019
## 72   OTB  VL1824     EOI 2019
## 73   OTB  VL1824     MTS 2019
## 74   OTB  VL1824     OCC 2019
## 75   OTB  VL1824     TGS 2019
## 76   OTB  VL1824     WHG 2019
## 77   OTB  VL1824     ANE 2020
## 78   OTB  VL1824     ARA 2020
## 79   OTB  VL1824     ARS 2020
## 80   OTB  VL1824     CTC 2020
## 81   OTB  VL1824     EDT 2020
## 82   OTB  VL1824     EOI 2020
## 83   OTB  VL1824     MTS 2020
## 84   OTB  VL1824     OCC 2020
## 85   OTB  VL1824     TGS 2020
## 86   OTB  VL1824     WHG 2020
## 87   OTB  VL1824     ANE 2021
## 88   OTB  VL1824     ARA 2021
## 89   OTB  VL1824     ARS 2021
## 90   OTB  VL1824     CTC 2021
## 91   OTB  VL1824     EDT 2021
## 92   OTB  VL1824     EOI 2021
## 93   OTB  VL1824     HOM 2021
## 94   OTB  VL1824     MTS 2021
## 95   OTB  VL1824     MUR 2021
## 96   OTB  VL1824     MUT 2021
## 97   OTB  VL1824     OCC 2021
## 98   OTB  VL1824     TGS 2021
## 99   OTB  VL1824     WHG 2021
## 100  OTB  VL1824     ANE 2022
## 101  OTB  VL1824     ARA 2022
## 102  OTB  VL1824     ARS 2022
## 103  OTB  VL1824     CTC 2022
## 104  OTB  VL1824     DPS 2022
## 105  OTB  VL1824     EDT 2022
## 106  OTB  VL1824     EOI 2022
## 107  OTB  VL1824     HKE 2022
## 108  OTB  VL1824     HOM 2022
## 109  OTB  VL1824     MTS 2022
## 110  OTB  VL1824     MUR 2022
## 111  OTB  VL1824     MUT 2022
## 112  OTB  VL1824     NEP 2022
## 113  OTB  VL1824     OCC 2022
## 114  OTB  VL1824     TGS 2022
## 115  OTB  VL1824     WHG 2022
## 116  OTB  VL2440     ANE 2017
## 117  OTB  VL2440     ARA 2017
## 118  OTB  VL2440     ARS 2017
## 119  OTB  VL2440     EOI 2017
## 120  OTB  VL2440     MAC 2017
## 121  OTB  VL2440     MTS 2017
## 122  OTB  VL2440     MUT 2017
## 123  OTB  VL2440     OCC 2017
## 124  OTB  VL2440     PIL 2017
## 125  OTB  VL2440     SPC 2017
## 126  OTB  VL2440     ARA 2018
## 127  OTB  VL2440     ARS 2018
## 128  OTB  VL2440     EOI 2018
## 129  OTB  VL2440     MTS 2018
## 130  OTB  VL2440     OCC 2018
## 131  OTB  VL2440     ARA 2019
## 132  OTB  VL2440     ARS 2019
## 133  OTB  VL2440     EOI 2019
## 134  OTB  VL2440     MTS 2019
## 135  OTB  VL2440     OCC 2019
## 136  OTB  VL2440     ARA 2020
## 137  OTB  VL2440     ARS 2020
## 138  OTB  VL2440     DPS 2020
## 139  OTB  VL2440     EOI 2020
## 140  OTB  VL2440     MTS 2020
## 141  OTB  VL2440     OCC 2020
## 142  OTB  VL2440     CTC 2021
## 143  OTB  VL2440     EOI 2021
## 144  OTB  VL2440     MTS 2021
## 145  OTB  VL2440     OCC 2021
## 146  OTB  VL2440     ARS 2022
## 147  OTB  VL2440     CTC 2022
## 148  OTB  VL2440     DPS 2022
## 149  OTB  VL2440     EOI 2022
## 150  OTB  VL2440     MTS 2022
## 151  OTB  VL2440     OCC 2022

Some species have zero value across the entire time series for specific gear and length combinations, so we removed them.

empty_df = empty_df %>%
  distinct(gear, vlength, species, year) %>%
  group_by(gear, vlength, species) %>%
  summarise(n_zero_years = n(), .groups = "drop") %>% 
  filter(n_zero_years == length(unique(euro_species_filtered$year)))

remove_keys <- paste(empty_df$gear, empty_df$vlength, empty_df$species, sep = "_")

euro_species_clean <- euro_species_filtered %>%
                      filter(!paste(gear, vlength, species, sep = "_") %in% remove_keys)

For other species, not all years have values. However, value correlates positively with landed weight, so the gaps are filled by fitting a linear regression of value against kilograms for each [gear-vessel length-species] combination and predicting the missing points from the fitted relationship.

When that relationship is weak or not statistically significant, the missing values are instead predicted from a regression that pools the vessel lengths (the [gear-species] fallback applied further below).
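The rule applied below can be sketched for a single [gear-vessel length-species] group (hypothetical numbers): fit `value ~ kg` on the non-zero years and, when the fit is tight enough, predict the zero-value years.

```r
# Toy time series (hypothetical): the last year has a missing (zero) value
df_toy <- data.frame(kg    = c(100, 200, 300, 400, 150),
                     value = c(500, 980, 1510, 2005, 0))

# fit the value-weight relationship on the observed (non-zero) years only
fit <- lm(value ~ kg, data = subset(df_toy, value > 0))

# replace zeros only when the linear fit explains enough variance
if (summary(fit)$r.squared > 0.4) {
  miss <- df_toy$value == 0
  df_toy$value[miss] <- predict(fit, newdata = df_toy[miss, ])
}
```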

df_full <- euro_species_clean
df_model <- df_full[df_full$yearly_value != 0, ]

non_significant_models <- list()

gear_list <- unique(df_model$gear)
vlength_list <- unique(df_model$vlength)

for (g in seq_along(gear_list)) {
  for (vl in seq_along(vlength_list)) {
    
    gear_value <- gear_list[g]
    vlen_value <- vlength_list[vl]
    
    subset_model <- df_model[df_model$gear == gear_value & 
                             df_model$vlength == vlen_value, ]
    
    species_list <- unique(subset_model$species)
    
    for (s in seq_along(species_list)) {
      
      species_value <- species_list[s]
      
      subset_species_model <- subset_model[subset_model$species == species_value, ]
      
      subset_zero <- df_full$gear == gear_value & 
                     df_full$vlength == vlen_value & 
                     df_full$species == species_value & 
                     df_full$yearly_value == 0
      
      if (nrow(subset_species_model) >= 2) {
        
        model <- lm(yearly_value ~ yearly_kg, data = subset_species_model)
        model_summary <- summary(model)
        fstat <- model_summary$fstatistic
        r_squared <- model_summary$r.squared
        pval <- pf(fstat[1], fstat[2], fstat[3], lower.tail = FALSE)
        
        if (!is.na(pval) && pval < 0.05 && r_squared > 0.4 && any(subset_zero)) {
          # predict and replace the zero values
          df_full[subset_zero, "yearly_value"] <- predict(model, newdata = df_full[subset_zero, ])
        } else {
          # model weak, not significant, or no zeros to fill: record it
          non_significant_models[[length(non_significant_models) + 1]] <- data.frame(
            species = species_value,
            gear = gear_value,
            vlength = vlen_value,
            p_value = ifelse(is.na(pval), NA, round(pval, 5)),
            r_squared = round(r_squared, 3)
          )
        }
      }
    }
  }
}

if (length(non_significant_models) > 0) {
  non_significant_df <- do.call(rbind, non_significant_models)
  print(non_significant_df)
} else {
  message("All models were significant.")
}
##         species gear vlength p_value r_squared
## value       HKE  OTB  VL1218 0.00054     0.301
## value1      ILL  OTB  VL1218 0.00000     0.969
## value2      MNZ  OTB  VL1218 0.00000     0.986
## value3      OCC  OTB  VL1218      NA     1.000
## value4      OCT  OTB  VL1218 0.00000     0.980
## value5      RJC  OTB  VL1218 0.00000     0.974
## value6      SCS  OTB  VL1218 0.00000     0.976
## value7      SQZ  OTB  VL1218 0.00000     0.905
## value8      WHB  OTB  VL1218 0.00000     0.940
## value9      ARY  OTB  VL1218 0.00000     0.903
## value10     CTC  OTB  VL1218 0.13643     0.190
## value11     DAB  OTB  VL1218 0.00000     0.982
## value12     JOD  OTB  VL1218 0.00000     0.763
## value13     MUR  OTB  VL1218 0.59223     0.009
## value14     PAC  OTB  VL1218 0.00000     0.946
## value15     ROL  OTB  VL1218 0.17240     0.059
## value16     SBR  OTB  VL1218 0.00000     0.858
## value17     WEG  OTB  VL1218 0.00000     0.942
## value18     DPS  OTB  VL1824 0.28945     0.045
## value19     HKE  OTB  VL1824 0.24448     0.052
## value20     NEP  OTB  VL1824 0.50660     0.017
## value21     MUT  OTB  VL1824 0.52300     0.032
## value22     ARY  OTB  VL2440 0.00000     0.956
## value23     CTC  OTB  VL2440 0.29481     0.180
## value24     DPS  OTB  VL2440 0.43113     0.052
## value25     HKE  OTB  VL2440 0.78938     0.002
## value26     ILL  OTB  VL2440 0.00000     0.999
## value27     JOD  OTB  VL2440 0.00000     0.982
## value28     MNZ  OTB  VL2440 0.00000     0.990
## value29     MUR  OTB  VL2440 0.71690     0.004
## value30     NEP  OTB  VL2440 0.75014     0.002
## value31     OCT  OTB  VL2440 0.00000     0.993
## value32     PAC  OTB  VL2440 0.00000     0.953
## value33     SQZ  OTB  VL2440 0.00000     0.942
## value34     SBA  OTB  VL2440 0.00000     0.877
## value35     SYC  OTB  VL2440 0.00000     0.975
## value36     MUT  OTB  VL2440 0.74592     0.005
## value37     ARA  OTB  VL2440 0.68407     0.022

When the value-kg relationship for a [gear-vessel length-species] combination is weak or not statistically significant, the missing values are instead predicted from a linear regression fitted at the [gear-species] level.

keys_nosig <- paste(non_significant_df$gear, non_significant_df$species, sep = "_")

df_full %>% 
  filter(paste(gear, species, sep = "_") %in% keys_nosig) %>% 
  ggplot()+
  geom_point(aes(x = yearly_kg, y = yearly_value))+
  facet_wrap(~gear+species, scales = "free")+
  theme_minimal()+
  ggtitle("Correlation at [Gear-Species] level for non-significant [Gear-Vlength-Species] combinations")

keys_nosig <- paste(non_significant_df$gear, non_significant_df$vlength, non_significant_df$species, sep = "_")

df_model_gs <- df_full %>% 
  filter(paste(gear, vlength, species, sep = "_") %in% keys_nosig) %>% 
  filter(yearly_value != 0)

non_significant_models <- list()

gear_list <- unique(df_model_gs$gear)

for (g in seq_along(gear_list)) {
  
  gear_value <- gear_list[g]
  subset_model <- df_model_gs[df_model_gs$gear == gear_value, ]
  
  species_list <- unique(subset_model$species)
    
    for (s in seq_along(species_list)) {
      
      species_value <- species_list[s]
      subset_species_model <- subset_model[subset_model$species == species_value, ]
      
      if (nrow(subset_species_model) >= 2) {
        
        model <- lm(yearly_value ~ yearly_kg, data = subset_species_model)
        fstat <- summary(model)$fstatistic
        pval <- pf(fstat[1], fstat[2], fstat[3], lower.tail = FALSE)
        
        if (!is.na(pval) && pval <= 0.05) {
          
          rows_to_update <- which(
            paste(df_full$gear, df_full$vlength, df_full$species, sep = "_") %in% keys_nosig &
            df_full$gear == gear_value &
            df_full$species == species_value &
            df_full$yearly_value == 0
          )
          
          if (length(rows_to_update) > 0) {
            predicted <- predict(model, newdata = df_full[rows_to_update, ])
            df_full$yearly_value[rows_to_update] <- predicted
          }
          
        } else {
          non_significant_models[[length(non_significant_models) + 1]] <- data.frame(
            gear = gear_value,
            species = species_value,
            p_value = ifelse(is.na(pval), NA, round(pval, 5))
          )
        }
      }
    }
  }


if (length(non_significant_models) > 0) {
  non_significant_df <- do.call(rbind, non_significant_models)
  print(non_significant_df)
} else {
  message("All models were significant.")
}
##        gear species p_value
## value   OTB     HKE 0.37703
## value1  OTB     OCC      NA
## value2  OTB     CTC 0.50123
## value3  OTB     MUR 0.55016
## value4  OTB     ROL 0.17240
## value5  OTB     DPS 0.17565
## value6  OTB     NEP 0.45257
## value7  OTB     MUT 0.51800
## value8  OTB     ARA 0.68407
for (g in seq_along(Gear_CS)) {

  p_m = df_full %>% 
  filter(gear == Gear_CS[g]) %>% 
    filter(yearly_value > 0) %>% 
  ggplot()+
  geom_point(aes(x = yearly_kg, y = yearly_value, color = vlength))+
  facet_wrap(~vlength+species, scales = "free")+
  theme_minimal()+
  ggtitle("Positive value in the time series after cleaning")
  
  print(p_m)
}

Now we are ready to disaggregate the data by landing port. First we remove the remaining zero values, keep the most landed species, and calculate price as value/kg.

df_full %>% 
   group_by(species, gear) %>%
   summarise(mvalue = mean(yearly_value), kg = (mean(yearly_kg))) %>%
  ggplot() +
  geom_bar(aes(x = kg, y = reorder_within(species, kg, gear), fill = gear), stat = "identity")+
  ylab("FAO species code")+
  facet_wrap(~ gear, scales = "free")+
  xlab("kg")+
  ggtitle("List of all landed species for the CS")+
  scale_y_reordered()+
  theme_minimal()
## `summarise()` has grouped output by 'species'. You can override using the
## `.groups` argument.

species_sel <- df_full %>%
  group_by(species, gear) %>%
  summarise(m_kg = mean(yearly_kg), .groups = "drop") %>%
  group_by(gear) %>%
  slice_max(m_kg, n = 20) %>%
  ungroup()  

 FDI_land_spe_filter <- df_full %>%
                    inner_join(species_sel, by = c("species", "gear")) %>% 
                    select(-m_kg) %>% 
                    mutate(price = yearly_value/yearly_kg)

Some results

FAO_sp = read.csv(paste0(wd,"ASFIS_sp_2025.csv")) %>% select("Alpha3_Code","Scientific_Name") %>% rename("species" = "Alpha3_Code")

FDI_land_spe_filter$year = as.character(FDI_land_spe_filter$year)

gear_filter = unique(FDI_land_spe_filter$gear)

for (g in seq_along(gear_filter)) {

  print(FDI_land_spe_filter %>% 
          filter(gear == gear_filter[g]) %>% 
          group_by(year, vlength, species) %>% 
          summarise(m_price = mean(price)) %>% 
          left_join(FAO_sp) %>%
          mutate(sp_id = paste0(Scientific_Name, " (", species, ")")) %>% 
          ggplot(aes(x = year, y = m_price, color = vlength,
                     group = interaction(species, vlength))) +
          geom_line(linewidth = 1.1) +
          geom_point() +
          facet_wrap(~ sp_id, scales = "free", ncol = 5) +
          theme_minimal() +
          ggtitle(paste0("Species price time series for ", gear_filter[g])) +
          ylab("Species price") +
          theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
  )
}
## Joining with `by = join_by(species)`

for (g in seq_along(gear_filter)) {
print(
  FDI_land_spe_filter %>%
  filter(gear == gear_filter[g]) %>% 
   group_by(vlength,species) %>%
   summarise(kg_gear = sum(yearly_kg)) %>%
  group_by(species) %>% 
  mutate(kg_tot = sum(kg_gear)) %>% 
  mutate(kg_prop = (kg_gear/kg_tot)*100) %>%
  left_join(FAO_sp) %>%
  mutate(sp_id = paste0(Scientific_Name," (",species,")")) %>% 
  ggplot() +
  geom_bar(aes(y = sp_id, x = kg_prop, fill = vlength),stat = "identity") +
  labs(title = "Kg of landing by vlength", y = "", x = "Kg prop") +
  theme_minimal()+
  theme(axis.text.y = element_text(face = "italic")))
}

Save data filtered

 saveRDS(FDI_land_spe_filter, paste0(fd_price,"FDI_land_spe_filter.RData"))

Step 3 - Calculate and add price by species

Load the dataset containing the number of vessels by port, fishing gear type, and vessel length, together with the port coordinates.

Here we first check the gear codes used and convert them to the FDI coding.

nvessel_port =  read_sf(paste0(fd,"output_df_2021.shp")) %>% 
                st_drop_geometry() %>% 
                select(MMSI, vlength, port, Gear) %>% 
                mutate(year = "2021") %>% 
                unique()
nvessel_port <- nvessel_port %>%
                group_by(year, Gear, vlength, port) %>%
                summarise(n_vessel_port = n())
nvessel_weighted <- nvessel_port %>%
                     group_by(year, Gear, port) %>% 
                     mutate(weight = n_vessel_port / sum(n_vessel_port)) %>% 
                     rename(gear = Gear)

FDI_land_spe_filter = readRDS(paste0(fd_price,"FDI_land_spe_filter.RData")) %>% 
                      group_by(year, gear, vlength, species) %>% 
                      summarise(mval = mean(yearly_value), mkg = mean(yearly_kg), mprice = mean(price))  
# each species row joins to multiple ports (and vice versa), so a many-to-many join is expected
FDI_land_spe_nvessel <- FDI_land_spe_filter %>%
                         inner_join(nvessel_weighted,
                                    by = c("year", "gear", "vlength"),
                                    relationship = "many-to-many")
FDI_land_spe_nvessel <- FDI_land_spe_nvessel %>%
                          mutate(
                            mval_by_port = mval * weight,
                            mkg_by_port = mkg * weight,
                            mprice_by_port = mprice * weight)

df_port_species <- FDI_land_spe_nvessel %>%
                    group_by(year, gear, vlength, port, species) %>%
                    summarise(
                      mval = sum(mval_by_port, na.rm = TRUE),
                      mkg  = sum(mkg_by_port, na.rm = TRUE),
                      mprice = sum(mprice_by_port)) 
df_port_species %>% 
  group_by(species, gear, vlength) %>% 
  summarise(mean_price = mean(mprice)) %>% 
  
ggplot( aes(y = reorder_within(species, mean_price, interaction(gear, vlength)), x = mean_price, fill = species)) +
  geom_bar(stat = "identity", position = "dodge") +
  facet_wrap(~ gear + vlength, scales = "free", ncol = 4) +
  theme_minimal() +
  scale_y_reordered() +
  theme(legend.position = "none")+
  labs(title = "Average price by species, gear and vessel length", x = "Mean price (€)", y = "Species")

Conclusion

Overall, the disaggregation protocol developed under WP2 provides a replicable and adaptable framework to enhance the spatial granularity and analytical potential of fisheries socio-economic data. While subject to certain limitations—chiefly related to data availability and resolution—the approach demonstrates the feasibility of integrating heterogeneous data sources (FDI, AER, GFW, and registries) to derive vessel-level and port-level indicators. This not only improves the precision of fisheries monitoring and assessment, but also strengthens the foundational data layer required for advanced socio-ecological modeling within the Digital Twin of the Ocean. Future work will focus on validating this methodology across multiple regions and fleet segments, with the goal of operationalizing it within broader simulation workflows and policy support tools.